Putting pesticides on the map for pollinator research and conservation

Wild and managed pollinators are essential to food production and the function of natural ecosystems; however, their populations are threatened by multiple stressors including pesticide use. Because pollinator species can travel hundreds to thousands of meters to forage, recent research has stressed the importance of evaluating pollinator decline at the landscape scale. However, scientists’ and conservationists’ ability to do this has been limited by a lack of accessible data on pesticide use at relevant spatial scales and in toxicological units meaningful to pollinators. Here, we synthesize information from several large, publicly available datasets on pesticide use patterns, land use, and toxicity to generate novel datasets describing pesticide use by active ingredient (kg, 1997–2017) and aggregate insecticide load (kg and honey bee lethal doses, 1997–2014) for state-crop combinations in the contiguous U.S. Furthermore, by linking pesticide datasets with land-use data, we describe a method to map pesticide indicators at spatial scales relevant to pollinator research and conservation.


Background & Summary
With nearly 90% of flowering plant species benefiting from the services of pollinators to set seed and produce fruit, pollinators are an essential component of healthy and diverse ecosystems and contribute significantly to food production [1][2][3][4] . However, populations of both wild and managed pollinators are facing serious challenges 5 . Population declines have been documented in several bee and butterfly species [6][7][8] , including the Eastern population of the monarch butterfly (Danaus plexippus), as indicated by strong reductions in overwintering colony size since the mid-1990s 9 . U.S. beekeepers lose around a third of their managed honey bee colonies each year 10 . The causes of pollinator declines are multifaceted and somewhat distinct for different taxa, but current evidence suggests that wild bees, honey bees, and butterflies share at least two key stressors: habitat loss and pesticide exposure 5,11,12 . Habitat loss limits the food and nesting resources available to support pollinator populations, while exposure to pesticides can kill pollinators outright or lead to sublethal effects on behavior, immunity, and reproduction 5,[11][12][13][14] . Furthermore, use of herbicides may influence pollinators indirectly by reducing the availability of their food plants 15 .
In the past decade, researchers have made significant progress in developing models to predict pollinator abundance and ecosystem services as a function of the landscape. For wild bees, the 'Lonsdorf model' translates land cover to abundance of nest sites and seasonal floral resources (predicted based on expert opinion), and combines this with flight ranges to derive indices of bee abundance and pollination service on each cell on a landscape 16,17 ; the model has been adapted for honey bees as well 18 . For monarch butterflies, researchers recently developed a spatial model that simulates the annual cycle of the Eastern monarch population, identifying regions where conservation actions could enhance monarch population stability 19 . Despite the significant value of current pollinator models based on resource availability, they could be improved by incorporating patterns of pesticide use.
There have been three main obstacles to incorporating pesticide use into landscape-scale research on pollinator health. First, although the U.S. has a substantial amount of public data on pesticide use, pesticide toxicity, and land use, these data are distributed across disparate government databases, each with idiosyncratic Methods overall strategy. The aim of this project was to synthesize publicly available data on land use, pesticide use, and toxicity to generate a 'toolkit' of data resources enabling improved landscape-scale research on pesticide-pollinator interactions. The main outcomes are several novel datasets covering ten major crops or crop groups in each of the 48 contiguous U.S. states: I) Average application rate (kg/ha/yr) of >500 common pesticide active ingredients , II) Aggregate bee toxic load (honey bee lethal doses/ha/yr) of all insecticides combined (1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014), (Note that this dataset ends in 2014 because after that year, data on seed-applied pesticides were excluded 29 , and these contribute significantly to bee toxic load 21 ) III) Reclass tables relating these pesticide-use indicators to land use/land cover classes to enable the creation of maps predicting annual pesticide loading at 30-56 m resolution.
An overview of the steps, inputs, and outcomes are provided in Fig. 1. Table 1.

Data inputs. A summary of input datasets is provided in
Pesticide data. Pesticide use data were last downloaded from the USGS National Pesticide Synthesis Project 30,31 in June 2020. This dataset reports total kg applied of 508 common pesticide active ingredients by combinations of state, crop group, and year for the contiguous U.S. from 1992-2017 (crop groups explained in Table 2). The data are derived primarily from farmer surveys conducted by a private firm (Kynetec). For California, USGS obtains data from the state's pesticide use reporting program 32 . USGS then aggregates and standardizes both data sources into a common national dataset that is released to the public and was used in this effort. The USGS dataset includes both a 'high' and a 'low' estimate of pesticide use, varying based on the treatment of missing values in the source data 31 . Because previous work on this dataset suggested that the 'low' estimate more closely matches independent pesticide estimates 33 , we used the 'low' estimate throughout, but assess the influence of this choice on the resulting estimates (see Technical Validation). While we focus on the 'low' estimate for the data and outputs presented in this manuscript, the workflow we developed can accommodate both the low and high estimates.
Crop area data. To translate pesticide use estimates into average application rates, it was necessary to divide total kg of pesticide applied by the land area to which it was potentially applied. Crop area data were last downloaded from the Quick Stats Database of the USDA 34 in May 2020, using data files downloaded from the 'developer' page. This USDA dataset contains crop acreage estimates generated from two sources: the Census of Agriculture (Census), which is comprehensive but conducted only once every five years 35 and the crop survey conducted by the National Agricultural Statistics Service (NASS), which is an annual survey based on a representative sample of farmers in major production regions for a more limited subset of crops 36 .
Honey bee toxicity data. Translating insecticide application rates into estimates of bee toxic load (honey bee lethal doses/ha/yr) required toxicity values for each insecticide active ingredient in the USGS dataset. We used LD 50 values for the honey bee (Apis mellifera) because this is the standard terrestrial insect species used in regulatory procedures, and so has the most comprehensive data available. This species is also of particular concern as an important provider of pollination services to agriculture. As previously reported 21  Data preparation. Relating datasets. A major challenge in this data synthesis effort was relating the various data sources to each other, given that each dataset has unique nomenclature and organization. We created the following keys (summarized in Table 3) to facilitate joining datasets: Fig. 1 Overview of the data synthesis workflow described in this paper.
I) USGS-USDA crop keys -Using documentation and metadata associated with the USGS pesticide dataset 31,33,40 , we created keys relating the USGS surveyed crop names ('ePest' crops) and the ten USGS crop categories to the large number of corresponding crop acreage data items in the Census and NASS datasets. For annual crops and hay crops we used 'harvested acres, ' and for tree crops we used 'acres bearing & non-bearing. ' These choices were made to maximize data availability and to correspond as closely as possible to the crop acreage from which the pesticide data were derived 31 . A separate key was developed for California because California pesticide data derives from different source data and covers a larger range of crops. II) USGS-CASRN compound key -Using USGS documentation as well as background information on pesticide active ingredients 38,41 , we generated keys relating USGS active ingredient names to chemical abstracts service (CAS) registry numbers to facilitate matching compounds to the ECOTOX and PPDB databases. III) USGS compound-category key -In this key we classified active ingredients into major groups (insecticides, fungicides, nematicides, etc.) and into mode-of-action classes on the basis of information from pesticide databases and resistance action committees 38,41-44 . IV) USGS-USDA compound key -To facilitate our data validation effort, we generated a key relating USGS compound names to USDA compound names, on the basis of information from several pesticide databases 38,41 . V) USGS-CDL land use-land cover keys -Using documentation from the USGS pesticide dataset describing the crop composition of each of the ten crop categories 31 , we created a key that matches these categories to land cover classes in the CDL. A separate key was developed for California given the differences in surveyed crops in this state, noted above.
Processing crop area data. Because of differences in the crops included in pesticide use estimates, crop acreage data were processed separately for California and for all other states, and then re-joined, as follows: Acreage data were first filtered to include only data at the state level, reporting total annual acreage for states in the contiguous U.S. after 1996. Acreage data were joined to the appropriate USGS-USDA crop key and only those crops represented in the pesticide dataset were retained. We then generated an acreage dataset with single rows for each combination of crop, state, and year using data from the Census when available (1997,2002,2007,2012,2017), data from NASS in non-Census years, and temporal interpolation to fill in remaining missing values (i.e. linear interpolation between values in the same state and crop in the nearest surrounding years). This process was repeated for California, using acreage data for only that state in combination with the CA crop key. Finally, acreage data in the two datasets were recombined, converted to hectares, and summed by USGS crop group. www.nature.com/scientificdata www.nature.com/scientificdata/ Processing honey bee toxicity data. Processing for the honey bee toxicity data has been described in detail elsewhere 21 . Briefly, toxicity values were categorized as contact, oral, or other and standardized where possible into µg/bee. Records were retained if they represented acute exposure (4 days or less) for adult bees representing contact or oral LD 50 values in µg/bee. To generate a consensus list of contact and oral LD 50 values for all insecticides reported in the USGS dataset, we gave preference to point estimates and estimates generated through U.S. or E.U. regulatory procedures, taking a geometric mean if multiple such estimates were available. Unbounded estimates ("greater than" or "less than" some value) were only used when point estimates were unavailable, using the minimum (for "less than") or the maximum (for "greater than"). If values for a compound were unavailable in both datasets, we used the median toxicity value for the insecticide mode-of-action group. And finally, in rare cases (n = 1/148 compounds for contact toxicity and 8/148 compounds for oral toxicity) we were still left without a toxicity estimate for a particular insecticide. In those cases, we used the median value for all insecticides.

Data synthesis.
Compound-specific application rates for state-crop-year combinations. USGS data on pesticide application were joined to data on crop area. Average pesticide application rates were calculated by dividing kg applied by crop area (ha) for each combination of compound, crop group, state, and year.
Aggregate insecticide application rates for state-crop-year combinations. The dataset from the previous step was filtered to include only insecticides, and then joined to LD 50 data by compound name. Bee toxic load associated with each insecticide active ingredient was calculated by dividing the application rate by the contact or oral LD 50 value (µg/bee) to generate a number of lethal doses applied per unit area. These values were then summed across compounds to generate estimates of kg and bee toxic load per ha for combinations of crop group, state, and year.
Missing values were estimated using temporal interpolation, where possible (i.e. linear interpolation between values in the same state and crop group in the nearest surrounding years). This dataset ends in 2014 because after that year seed-applied pesticides were excluded from the source data 29 , and they constitute a major contribution to bee toxic load 21 .
We focused bee toxic load on insecticides for three reasons. First, quality of LD 50 data is highest for insecticides and uneven for fungicides and herbicides. Point estimates make up the majority of LD 50 values for insecticides, whereas < 25% of herbicide and fungicide LD 50 values are represented by a point estimate (i.e. a majority of these compounds have a best estimate of the form " > 100 µg/bee", increasing the uncertainty of downstream estimates). Second, insecticides tend to have greater acute toxicity toward insects than fungicides and herbicides (median [IQR] LD 50 = 100  µg/bee for fungicides, 100 [75-112] µg/bee for herbicides, and 1.36 [0.  µg/bee for insecticides). As a result, insecticides account for > 95% of bee toxic load nationally, even when herbicides and fungicides are included (and even though insecticides make up only 6.5% of pesticides applied on a weight basis). Third, focusing these values on insecticides increases their interpretability, reflecting efforts directed toward insect pest management, rather than a mix of insect, weed, and fungal pest management (which often have distinct dynamics and constraints for farmers).
While we chose to include only insecticides in this aggregate value, users are welcome to adjust the workflow to include fungicides and herbicides if desired. To this end, we provide our best estimates for LD 50 values for fungicides and herbicides in the USGS dataset (Table 4).

USGS category
Crops included in category Additional crops in California*  www.nature.com/scientificdata www.nature.com/scientificdata/ Reclassification tables. To generate reclassification tables for the CDL, the pesticide datasets described above were joined by crop group to CDL land use categories. The output of these processes was a set of reclassification tables for combinations of compound, state, and year. Also generated was a set of reclassification tables for aggregate insecticide use for combinations of state and year.
Of the 131 land use categories in the CDL, 16 represent two crops grown sequentially in the same year (double crops, found on ~2% of U.S. cropland in 2012 45 ), which required a modified accounting in our workflow. Pesticide use practices on double crops are not well described, but one study suggested that pesticide expenditures on soybean grown after wheat were similar to pesticide expenditures in soybean grown alone 46 . Therefore, we assumed that pesticide use on double crops would be additive (e.g. for a wheat-soybean double crop, the annual pesticide use estimate was generated by summing pesticide use associated with wheat and soybean).
Missing values in the reclassification tables resulted from several distinct issues. Some values were missing because a particular crop was not included in the underlying pesticide use survey (e.g. oats was not included in the Kynetec survey), or because the land use category was not a crop at all (e.g. deciduous forest). These two issues were indicated with values of '1' in columns called 'unsurveyed' and 'noncrop, ' respectively. For double crops, a value of 0.5 in the 'unsurveyed' column indicates that one of the crops was surveyed and the other was not. For compound-specific datasets, missing values may reflect that a given compound was not used in a state-crop group-year combination. For the aggregate insecticide dataset, even after interpolation there were some missing values, usually when a state had very little area of a particular crop or crop group.
Finally, missing data for double crops were treated slightly differently in the aggregate vs. compound-specific reclassification tables. For the aggregate insecticide dataset, estimates for double crops were only included if estimates were available for both crops; otherwise the value was reported as missing. For the compound-specific datasets, estimates for double crops were included if there was an estimate for at least one of the crops, since specific compounds may be used in one crop but not another.

Data Records
All data files generated in this work are available in .csv format on figshare 47 . A summary of output data files is provided in Table 4.

technical Validation
Validity of pesticide use estimates. To assess the validity of the dataset, we compared our estimates for specific compounds to those available from the USDA's Agricultural Chemical Use Program 48 for the period 1997-2017. While this dataset is not as comprehensive as the USGS dataset in terms of its coverage of states, years, crops, and active ingredients, the values that are available can be compared to USGS values for total kg applied and estimates of kg/ha/yr. An important limitation of this comparison is that the USDA dataset does not include seed-applied pesticides, affecting the estimation of certain fungicide and insecticide active ingredients 29 .
For combinations of state, crop, and active ingredient, the USDA survey reports several values that are relevant to this effort: total amount (kg) of pesticide applied, percent area treated, and application rate (kg/ha) on treated hectares. Application rates across the two datasets are not directly comparable because USDA reports the rate (kg/ac) applied on treated acres, whereas our dataset estimates the rate (kg/ha) averaged across all crop area in a state. To test the agreement between our estimates and the USDA estimates we compared: I) Total amount applied (kg), II) Application rate (kg/ha) calculated by dividing the USDA total kg by our estimate of crop area (ha) (Method 1), III) Application rate (kg/ha) derived indirectly by multiplying the USDA average application rate on treated acres by the percent of area treated (Method 2).
We focused on major field crops (corn, soy, wheat, cotton, rice) because they are the most frequently surveyed by USDA and have the closest one-to-one match between datasets. We suspect that estimates for fungicide and insecticide use may be more precise in high-value fruit and vegetable crops, where these pesticides are more widely used 25,27,28 . However, fruit and vegetable crops could not be included in the validation because estimates are incommensurable across the two datasets: USGS reports pesticide use for crop groups like 'orchards and  Table 3. Keys generated to relate datasets.
www.nature.com/scientificdata www.nature.com/scientificdata/ grapes' while USDA reports pesticide use on a small subset of individual crops in major production states (e.g. apples in Washington).
For each set of comparisons, we calculated and visualized relative percent difference between the two estimates, visualized the correlation between estimates, and calculated Spearman's and Pearson's correlation coefficients for each major pesticide class (fungicides, herbicides, insecticides).
Estimates for total amount applied and per-ha application rate were strongly correlated for fungicides and herbicides, with Pearson's correlation coefficients > 0.85 for all comparisons (Table 5, Fig. 2). Estimates for insecticides were well correlated on the basis of ranks (Spearman's rho = 0.85) but only weakly linearly related (Pearson's r = 0.20-0.38, Table 5, Fig. 2). This pattern was driven by malathion in cotton, which had very low estimates in the USGS dataset and fairly high estimates in the USDA dataset, for reasons unknown. It is possible that this discrepancy is related to use of malathion in boll weevil eradication 49 . Once these outliers were removed (n = 30 out of 1600 + observations), estimates for insecticides were well correlated for all three comparisons (Pearson's r > 0.75).
Among pesticide categories, the median relative difference between USGS-based datasets and USDA estimates was smallest for herbicides and largest for insecticides (Table 5). For fungicides, the median relative difference did not differ significantly from zero for any of the three comparisons, indicating a lack of consistent over-or under-estimation. For herbicides, median relative difference was not significantly different from zero for total kg or application rate (Method 1). However, it was significantly positive for application rate (Method 2), indicating a tendency for the USGS-based estimate to be slightly larger than the USDA-based estimate (median difference of 3%). For insecticides, median relative difference was significantly less than zero for all comparisons (−10-14%), indicating that the USGS-based estimates are conservative compared to the USDA estimates. Median relative difference decreased slightly when malathion outliers were removed (−8-12%), but were still significantly less than zero. This result may derive from using the USGS 'low' pesticide use values (see Sensitivity analyses). Importantly, relative difference between estimates declined with percent of crop area treated (Fig. 3), suggesting that estimates are more precise for active ingredients that are widely used. This pattern may help to explain why median relative difference between datasets was smallest for herbicides and largest for insecticides, given that field crops formed the basis of the validation. In field crops, herbicide use is very widespread while use of traditional insecticides and fungicides is more limited 28 . Since the mid-2000's the most widely applied insecticides in field crops have been neonicotinoid seed treatments 29,50 , which are excluded from the USDA survey and by extension, this comparison. Previous work 50 suggests that USGS estimates of neonicotinoid use in major field crops are consistent with independent estimates. Geographic coverage. To characterize the geographic coverage of our datasets, we calculated the percent of total land area and agricultural area in each state that is included in the underlying pesticide use survey, as well as the percent of agricultural area that is represented by a crop-specific estimate (vs. an estimate for a group of crops such as "vegetables and fruit"). To do this, we downloaded the CDL for all states in even years, calculated the area in each CDL category, joined the data to the CDL-USGS land use/land cover key, and aggregated the area depending on whether it was surveyed, unsurveyed, or non-agricultural. In most states, the survey covers > 95% of agricultural land (> 80% of agricultural land in all states, Fig. 4). States on the lower end of this range have significant area of regionally important yet unsurveyed crops (e.g. blueberries in Maine, cranberries in Massachusetts, grass seed in Oregon). The majority of crop area is associated with crop-specific pesticide estimates, although this is not true for all states (Fig. 4). Double crops contributed < 5% to agricultural area in most states, except North Carolina, Maryland, and Delaware, where they comprised 5-23% of agricultural area. As expected, survey  Table 4. Data outputs generated by this study. *This dataset is being provided for convenience of potential users, but was not used in downstream analyses here.
www.nature.com/scientificdata www.nature.com/scientificdata/ coverage of total land area was variable among states, ranging from < 10% in states dominated by forest (e.g. New Hampshire) or shrubland (e.g. Nevada), to > 80% in states with abundant cropland (e.g. Iowa).

Sensitivity analyses.
We conducted several additional analyses to quantify the sensitivity of our results to the methods employed in generating the dataset. To test the influence of using the low vs. the high USGS pesticide estimate, we regenerated the dataset using the high estimate and compared validation metrics. Consistent with previous analyses 33 , the high estimate from USGS tended to exceed the USDA value, with median relative differences ~30% for fungicides, 10-12% for herbicides, and 6-10% for insecticides. For all active ingredients combined, the median relative differences were 10-12% for the high estimate, versus -0.1-1% for the low estimate. The low estimate was therefore closer to the USDA value except in the case of insecticides, in which the low estimate underestimated the USDA value to a slightly greater degree than the high estimate overshot. However, correlations between USDA and USGS estimates were similarly strong regardless of whether the high or low estimate was used (Pearson's r > 0.75 for all comparisons), as was the correlation between high and low USGS estimates themselves (Pearson's r = 0.95). Both findings suggest that relative patterns in the data are robust.
Another concern related to the low vs. high USGS pesticide estimate is that missing data are more prevalent in the low dataset. Across the entire compound-specific dataset, 17% of crop-state-year-compound combinations that are present in the high dataset are absent in the low dataset (see https://land-4-bees.github.io/bee_ tox_index/validation_low_v_high.html). Missing data are most prevalent in alfalfa and pasture/hay (37% and 33%, respectively), land uses with generally low pesticide use intensity 26,51 . They are least prevalent in vegetables and fruit (7%), and orchards and grapes (8%), high-value crop types that tend to have high pesticide use intensity 26,27,52,53 . It is important to recognize that missing data may be 'real' , i.e. reflect combinations of crops, states, and years in which a given pesticide was not used. There is no simple way to know whether this is the case or that the data are missing due to limitations in the methodology generating the estimates (see further discussion from USGS 31 ). Given that the low estimate better matches independent estimates, we decided to focus our analyses on the low dataset. That said, the workflow we provide can be adapted to generate estimates from the high values or interpolate to fill missing values if that is desired (see code at https://land-4-bees.github.io/bee_tox_index).
To determine the potential uncertainty introduced by interpolating crop area when it was missing, we investigated the amount of total area contributed by interpolated values across the dataset. Across all years combined, interpolated area was a small percentage of most crops or crop groups (<4% of total area), with the exceptions of 'orchards and grapes' (76%), 'pasture and hay' (76%), and 'vegetables and fruit' (36%). Estimates for land area in these groups relied heavily on interpolation between Census years. We are confident in the interpolated values because they are mostly for perennial crops, so it is less likely that the acreage would change substantially in less than 5 years. That said, for the most accurate estimates of crop area, we recommend using the pesticide datasets matching Census years (1997,2002,2007,2012,2017).
We also investigated the influence of interpolated values for aggregate insecticide indicators. While interpolated values occupied a significant proportion of surveyed agricultural area (59%), they contributed < 1.5% to total insecticide load in kg or honey bee lethal doses. This pattern was driven by extensive interpolation for 'pasture and hay, ' a land use category that occupies significant land area but typically has very low insecticide  www.nature.com/scientificdata www.nature.com/scientificdata/ inputs. Conversely, land use categories with heavy insecticide inputs (e.g. cotton, 'vegetables and fruits' , 'orchards and grapes') had little to no interpolation.

Usage Notes
Functions to generate maps. One of the main goals of this project was to develop a methodology to generate maps of predicted pesticide use and bee toxic load. We provide two R functions (available in the 'beecoSp' package, https://github.com/land-4-bees/beecoSp) to reclassify land cover into predicted pesticide application rate (kg/ha) for combinations of active ingredients, states, and years, or to reclassify land cover into predicted bee toxic load (honey bee LD 50 /ha) for all insecticides combined. The 'reclasstables' function facilitates the creation of reclass tables for particular state-year combinations from the master files provided here, while the 'CDL_reclass' function uses such a reclass table to reclassify a land use raster, with the option to calculate the mean of the new raster. The inputs include an appropriate land cover raster file (from the CDL) and an appropriate master reclassification file from those accompanying this paper. Examples of the input file and output of maps for a particular active ingredient and for bee toxic load are shown in Fig. 5. Alternatively, for users interested in performing reclassification in another GIS program, reclassification tables may be generated from the datasets produced here or downloaded from the website described below.
insecticide Explorer web application. To enable users to explore insecticide use patterns and easily download reclassification tables for bee toxic load, we created an interactive website. The website allows users to generate graphs describing trends in national and state-level total and per ha insecticide use (in kg and honey bee lethal doses per year) from 1997-2014 ('Explore' tab). There is also a section of the website allowing download of reclassification tables for bee toxic load for particular state-year combinations ('Data' tab). The application can be accessed at the following URL: https://insecticideexplorer.shinyapps.io/insecticideexplorer/.
Limitations and possible adjustments. Users should be aware of several limitations of these data. First, the pesticide use estimates include only pesticides applied to agricultural land, and so do not account for pesticide applications on other land uses, i.e. urban and unmanaged areas. Similarly, for minor crops (e.g. fruits and  Relationships between the percent of cropland treated with a pesticide and the relative difference between the USDA pesticide use estimates and novel estimates generated from USGS data reported in this paper. Each point represents a combination of state, crop, year, and active ingredient. Relative difference = [(USGS -USDA)/average of the two values] × 100. Method 1 (M1) compared our estimate to one calculated by dividing the USDA total kg by our estimate of crop area (ha), and Method 2 (M2) compared our estimate to one calculated by multiplying the USDA average application rate on treated hectares by the percent of area treated. www.nature.com/scientificdata www.nature.com/scientificdata/ unsurveyed crops, or crops included in larger categories, the reclassification tables can be adjusted or supplemented to better represent pesticide use on those land cover classes.
Second, these estimates represent crop-state-year pesticide use averages, not field-specific application. We used aggregated data to predict finer scale pesticide use patterns because i) field-level pesticide use data are unavailable for the vast majority of the U.S., and ii) crop groups, states, and years capture significant variation in pesticide use [25][26][27][28] . That said, we acknowledge that the uncertainty associated with these estimates at the field level is unknown; there will be significant local variation not reflected in these estimates. Furthermore, the resulting maps are constrained by the quality of the underlying land cover data, which are most accurate for large-acreage crops and major production regions and least accurate for minor crops and minor production regions 54,55 . These estimates and associated maps are therefore most suited to applications in which the goal is to characterize or compare pesticide use patterns on a relatively broad spatial scale (e.g. comparing across multiple sites within www.nature.com/scientificdata www.nature.com/scientificdata/ or across states and years where landscapes vary in crop composition), and are not suitable for applications in which site-specific or field-scale accuracy is paramount (e.g. predicting pesticide use on an individual farm). We caution against inferring location-specific pesticide use from these data. It is still an open question how detailed pesticide use data need to be to elucidate pollinator-pesticide linkages; we hope the datasets we provide will enable research clarifying this issue.
Third, the temporal dimension of these estimates is limited by the source data, which are reported annually, and complete only through 2014, after which seed-applied pesticides are excluded. The annual nature of the data constrains their use in mechanistic risk assessments, which require more specific information about site, method, and time of application. A key issue in the exposure of pollinators will be the degree of temporal overlap of insecticide application and persistence with pollinator activity in and around contaminated matrices (e.g. soil for ground-nesting species, floral resources) 14 . Given the annual grain of these data, we expect them to best reflect the hazard to species that are active for a large portion of the year (e.g. honey bees, bumble bees) and for compounds with long systemic or residual activity (e.g. many neonicotinoid applications). The exclusion of seed treatments after 2014, and in California for all years, is problematic because seed-applied pesticides are increasingly prevalent in U.S. agriculture 29 and seed-applied neonicotinoids represent the most important contributor to national bee toxic load calculated based on oral toxicity 21,22 . We encourage users to investigate temporal www.nature.com/scientificdata www.nature.com/scientificdata/ patterns in use estimates for their target area to determine the most appropriate course to take for years after 2014. In some cases, projecting earlier use rates to later years -while clearly imperfect -may be the best available option until more valid public data becomes available.
There are also important limitations related to the 'bee toxic load' estimate for all insecticides combined. Most obviously, because they are based on honey bee LD 50 units, these estimates are most relevant to honey bees and closely related species. The literature suggests that broad patterns of toxicity are similar between honey bees and other bee species 56 , but there is significant variation among species, which is more pronounced when more distant insect taxa are considered 57,58 . If a quantitative estimate of lethal doses for some other bee species is desired, at minimum, users should consider adjusting for body weight (honey bee workers weigh ~120 mg each 59 ). Also, because these estimates are based on acute toxicity to adult honey bees, they do not reflect chronic toxicity or effects on other life stages such as larvae. It is up to the user to decide if toxic load on a contact-or oral-toxicity basis, or application rate of specific active ingredients, is most appropriate for a given application. Finally, because we summed across active ingredients, our estimate does not account for potential synergy between them.
And lastly, these estimates represent loading of pesticides to the landscape but not persistence or movement in the environment. These data could be combined with additional information to model fate in the environment and exposure of recipient organisms.

Code availability
All data processing was performed in the R statistical language 60 . Code is organized into a workflow that is documented on the following website: https://land-4-bees.github.io/bee_tox_index/.